Gastrulation-stage gene expression in Nipbl+/− mouse embryos foreshadows the development of syndromic birth defects

In animal models, Nipbl deficiency phenocopies gene expression changes and birth defects seen in Cornelia de Lange syndrome, the most common cause of which is Nipbl haploinsufficiency. Previous studies in Nipbl+/− mice suggested that heart development is abnormal as soon as cardiogenic tissue is formed. To investigate this, we performed single-cell RNA sequencing on wild-type and Nipbl+/− mouse embryos at gastrulation and early cardiac crescent stages. Nipbl+/− embryos had fewer mesoderm cells than wild-type and altered proportions of mesodermal cell subpopulations. These findings were associated with underexpression of genes implicated in driving specific mesodermal lineages. In addition, Nanog was found to be overexpressed in all germ layers, and many gene expression changes observed in Nipbl+/− embryos could be attributed to Nanog overexpression. These findings establish a link between Nipbl deficiency, Nanog overexpression, and gene expression dysregulation/lineage misallocation, which ultimately manifest as birth defects in Nipbl+/− animals and Cornelia de Lange syndrome.

At each stage, we considered cells that exceeded three median absolute deviations in percent mitochondrial genes expressed, number of genes expressed, and number of transcripts detected per cell to be low-quality cells and/or doublets, and removed them. Histograms of (A) percent mitochondrial genes expressed, (B) number of genes expressed, and (C) number of transcripts detected per cell at LB- and CC-stage, colored by whether cells were less than or equal to 3 median absolute deviations (red) or greater than 3 median absolute deviations (blue).

We corrected batch effects among embryos by integrating cells from the same stage and genotype together. UMAP of WT and Nipbl+/− cells colored by embryo at (A) LB- and (B) CC-stage.
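The three-median-absolute-deviation filter described above can be illustrated with a minimal sketch. The function names, the metric keys, the use of an unscaled MAD, the two-sided threshold, and the choice to flag a cell that is an outlier on any one metric are all assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from statistics import median

def mad_outlier_flags(values, n_mads=3.0):
    """Flag values lying more than n_mads median absolute deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # unscaled MAD (assumption)
    return [abs(v - med) > n_mads * mad for v in values]

def flag_low_quality(cells, metrics=("pct_mito", "n_genes", "n_umis"), n_mads=3.0):
    """Flag a cell if it is an outlier on ANY metric (assumption).

    `cells` is a list of dicts; the metric keys are hypothetical names.
    """
    flags = [False] * len(cells)
    for metric in metrics:
        metric_flags = mad_outlier_flags([cell[metric] for cell in cells], n_mads)
        flags = [a or b for a, b in zip(flags, metric_flags)]
    return flags

# Toy example: the last cell has an extreme mitochondrial percentage.
toy_cells = [
    {"pct_mito": 10, "n_genes": 3000, "n_umis": 9000},
    {"pct_mito": 11, "n_genes": 3100, "n_umis": 9100},
    {"pct_mito": 12, "n_genes": 3000, "n_umis": 9000},
    {"pct_mito": 11, "n_genes": 2900, "n_umis": 8900},
    {"pct_mito": 10, "n_genes": 3100, "n_umis": 9100},
    {"pct_mito": 50, "n_genes": 3000, "n_umis": 9000},
]
toy_flags = flag_low_quality(toy_cells)
```

Cells flagged `True` correspond to those colored blue in the histograms, which would be removed before downstream analysis.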

Fig. S5. Example of optimized clustering using clustering tree and Shannon Entropy.
(A) Following consecutive clustering of WT cells, we generated a clustering tree to visualize how cell cluster identities change as the number of clusters increases. Nodes represent clusters and edges represent cells from preceding clusters. Clusters are stable when a large proportion of cells are derived from a single preceding cluster rather than multiple preceding clusters. We adopted the Shannon Entropy of these proportions as a measure of intra-cluster stability. A low Shannon Entropy represents high intra-cluster stability. Here, WT cells from LB-stage embryos can be divided into at most 5 clusters before further clustering only splits them into subclusters. WT cells from CC-stage embryos can be divided into at most 4 clusters before further clustering only splits them into subclusters. (B) Total Shannon Entropy of all clusters among WT cells from LB- or CC-stage embryos at consecutively increasing numbers of clusters.

At each stage, we considered cells that exceeded three median absolute deviations in percent mitochondrial genes expressed, number of genes expressed, and number of transcripts detected per cell to be low-quality cells and/or doublets, and removed them. Histograms of (A) percent mitochondrial genes expressed, (B) number of genes expressed, and (C) number of transcripts detected per cell at EB- and EHF-stage, colored by whether cells were less than or equal to 3 median absolute deviations (red) or greater than 3 median absolute deviations (blue). Cells colored in blue were removed.
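The entropy-based stability measure described for the clustering tree can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names are hypothetical, and the use of log base 2 and an unweighted sum over clusters are assumptions.

```python
import math
from collections import Counter

def cluster_entropy(parent_labels):
    """Shannon entropy (bits) of the proportions of preceding clusters feeding one cluster."""
    counts = Counter(parent_labels)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def total_entropy(previous_labels, current_labels):
    """Sum the per-cluster entropies at one level of the clustering tree.

    previous_labels[i] and current_labels[i] are the cluster assignments of
    cell i at the preceding and the current resolution, respectively.
    """
    parents_by_cluster = {}
    for parent, cluster in zip(previous_labels, current_labels):
        parents_by_cluster.setdefault(cluster, []).append(parent)
    return sum(cluster_entropy(parents) for parents in parents_by_cluster.values())

# A cluster drawn entirely from one preceding cluster is maximally stable.
stable = cluster_entropy(["a", "a", "a", "a"])
# A cluster drawn equally from two preceding clusters is less stable.
mixed = cluster_entropy(["a", "a", "b", "b"])
```

Under this reading, a clean split of one parent cluster into two children adds no entropy, whereas children that draw cells from several parents raise the total.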

Fig. S9. Batch effect correction and integration of cells of the same stage (EB- or EHF-stage).
We removed batch effects among EB- and EHF-stage embryos and integrated embryos of the same stage together. UMAP of EB- and EHF-stage cells colored by embryo.

The germ layers of Nipbl+/− embryos do not substantially overexpress or underexpress genes associated with apoptosis. Heatmap of fold change in expression of genes from the (A) Reactome Apoptosis, (B) Hallmark Apoptosis, and (C) Langemeijer Apoptosis gene sets in germ layers of Nipbl+/− embryos relative to WT embryos. Genes are ordered from top to bottom by minimum Q-value. Q-values are Bonferroni-corrected P-values from the Mann-Whitney U test.
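The Q-values referred to above combine a Mann-Whitney U test with a Bonferroni correction. The sketch below shows one way to compute them using only the standard library. It uses a two-sided normal approximation with tie correction and no continuity correction, which is an assumption; the software and settings actually used by the authors are not stated in this section, and the function names are hypothetical.

```python
import math
from collections import Counter

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value from a normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Toy expression values for one gene in WT and mutant cells (made-up numbers).
u_stat, p_val = mann_whitney_u([1, 2, 3], [4, 5, 6])
q_vals = bonferroni([p_val, 0.5])
```

For real single-cell data with thousands of cells per group, the normal approximation is usually adequate; for very small groups an exact test would be preferable.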

Fig. S12. Expression of meta-PCNA genes and cell cycle phase composition of Nipbl+/− embryos relative to WT.
The germ layers of Nipbl+/− embryos do not substantially overexpress or underexpress genes associated with cell proliferation. (A) Heatmap of fold change in expression of genes from the meta-PCNA gene set in germ layers of Nipbl+/− embryos relative to WT embryos. Genes are ordered from top to bottom by minimum Q-value. Q-values are Bonferroni-corrected P-values from the Mann-Whitney U test. (B) Principal component analysis was performed on cells from each germ layer of WT and Nipbl+/− embryos using cell cycle phase genes marking S phase and G2/M phase. Seurat was used to assign cells to G1, S, or G2/M phase based on the expression of these markers. Expression of these marker sets is considered anticorrelated. When cells express neither, they are considered to be in G1 phase. In all germ layers, cells from WT and Nipbl+/− embryos were assigned to all three phases. (C) Percentage of cells in each cell cycle phase across all germ layers of WT and Nipbl+/− embryos. There was no statistically significant difference in the percentage of cells in each cell cycle phase between WT and Nipbl+/− embryos. Error bars show standard error of the mean.

Percentage of differentially expressed genes (Q < 0.05, Mann-Whitney U Test) in (A) mesoderm, (B) ectoderm, and (C) endoderm of LB-stage Nipbl+/− embryos resulting from the projection of WT cells onto Nipbl+/− germ layers (reverse projection) that match those differentially expressed in the germ layers of Nipbl+/− embryos resulting from the projection of Nipbl+/− cells onto WT germ layers (forward projection). Fold change in expression of differentially expressed genes (Q < 0.05, Mann-Whitney U Test) in the (D) mesoderm, (E) ectoderm, and (F) endoderm of LB-stage Nipbl+/− embryos following projection of WT cells onto Nipbl+/− germ layers (reverse projection) versus the fold change in expression of the same genes in the germ layers of WT embryos following projection of Nipbl+/− cells onto WT germ layers (forward projection).

(A) Fold change in expression of genes in CC-stage Nipbl+/− embryos that were differentially expressed (Q < 0.05, Mann-Whitney U Test) from WT embryos, along with their average expression in WT embryos. Genes that were also differentially expressed (Q < 0.05, T-Test) in E9.5 mouse embryos transgenic for doxycycline-inducible expression of Nanog (Nanog Dox+) relative to WT embryos (Nanog Dox-) from (113) are colored dark red (upregulated) and dark blue (downregulated). (B) Percentage of genes in CC-stage Nipbl+/− embryos that were differentially expressed (Q < 0.05, Mann-Whitney U Test) from WT embryos and that were also differentially expressed in the same direction in E9.5 Nanog Dox+ embryos. At CC-stage, 57% of overexpressed and 63% of underexpressed genes in Nipbl+/− embryos were also overexpressed or underexpressed, respectively, in E9.5 Nanog Dox+ embryos. (C) Fold change in expression of genes in the germ layers of CC-stage Nipbl+/− embryos that were differentially expressed (Q < 0.05, Mann-Whitney U Test) from WT embryos, along with their average expression in WT embryos. Genes that were also differentially expressed (Q < 0.05, T-Test) in E9.5 Nanog Dox+ embryos are colored dark red and dark blue. At CC-stage, all germ layers of Nipbl+/− embryos showed downregulation of genes associated with primitive erythropoiesis, including the hemoglobin genes Hbb-y, Hba-a1, and Hba-x (data S41, data S42, and data S43). (D) Fold change in expression of genes in CC-stage Nipbl+/− embryos versus their fold change in E9.5 Nanog Dox+ embryos. Genes differentially expressed (Q < 0.05, T-Test) in E9.5 Nanog Dox+ embryos are colored dark red and dark blue. At CC-stage, the slope of the regression line comparing misexpression in Nipbl+/− and E9.5 Nanog Dox+ embryos was found to be 0.3 (panel D). Additionally, at CC-stage, Nipbl+/− embryos upregulated 80% of the same genes that were overexpressed in E7.5 Nanog Dox+ embryos and downregulated 41% of the same genes underexpressed in E7.5 Nanog Dox+ embryos. (E) Heatmap of fold change in expression in germ layers of CC-stage Nipbl+/− embryos, relative to WT embryos, of the genes most differentially expressed in E9.5 Nanog Dox+ embryos relative to Nanog Dox- embryos (lowest Q-values from T-Test) from (113). Genes are ordered from top to middle (overexpressed in Nanog Dox+ embryos) and bottom to middle (underexpressed in Nanog Dox+ embryos) from lowest to highest Q-value. Underexpressed genes in Nanog Dox+ embryos were strongly mirrored in the germ layers of CC-stage Nipbl+/− embryos (panel E). In CC-stage Nipbl+/− embryos, only a few overexpressed genes observed in Nanog Dox+ embryos were replicated, the most prominent of which was Nanog itself.
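The same-direction overlap reported in panel B and the regression slope reported in panel D can be illustrated with a small sketch. The gene names and fold-change values below are made up, the function names are hypothetical, and treating the Nanog Dox+ fold change as the x-axis of an ordinary least-squares fit is an assumption rather than something the caption specifies.

```python
from statistics import mean

def directional_concordance(query_log_fc, reference_log_fc):
    """Percentage of over/underexpressed query genes that change the same way in the reference.

    Both arguments map gene name -> log fold change for differentially
    expressed genes only; genes absent from the reference count as discordant.
    """
    result = {}
    for label, sign in (("over", 1), ("under", -1)):
        genes = [g for g, fc in query_log_fc.items() if fc * sign > 0]
        shared = [g for g in genes if reference_log_fc.get(g, 0.0) * sign > 0]
        result[label] = 100 * len(shared) / len(genes) if genes else float("nan")
    return result

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up log fold changes for illustration only.
query = {"geneA": 1.0, "geneB": 2.0, "geneC": -1.0, "geneD": -0.5}
reference = {"geneA": 0.5, "geneC": -2.0, "geneD": 0.3}
overlap = directional_concordance(query, reference)
slope = ols_slope([0.0, 1.0, 2.0], [0.0, 0.3, 0.6])
```

A slope below 1 in such a fit would indicate that the changes in the query condition are, on average, smaller in magnitude than the corresponding changes in the reference condition.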

Fig. S2. Metrics used to identify low-quality cells and doublets among LB- and CC-stage embryos.

Fig. S4. Batch effect correction and integration of cells of the same stage (LB- or CC-stage) and genotype (WT or Nipbl+/−).

Fig. S6. Percentage of cells in germ layers and mesodermal subpopulations of LB-stage embryos following projection of WT cells onto clustered Nipbl+/− cells.

Fig. S8. Metrics used to identify low-quality cells and doublets among EB- and EHF-stage embryos.

Fig. S10. Clustering of EB- and EHF-stage cells from WT embryos and annotation thereof as either ectoderm, mesoderm, or endoderm.

Fig. S13. Paraxial mesoderm lineage of LB-stage Nipbl+/− embryos, showing underexpression of paraxial mesoderm anti-drivers.
Fold change in expression of genes in mesoderm cells of the PM lineage of LB-stage Nipbl+/− embryos that were differentially expressed (Q < 0.05, Mann-Whitney U Test) from WT embryos, along with their Spearman's Rank Correlation coefficient from Fig. 5E.
Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Germ_Layer = germ layer assigned to cluster.

Data S2. Differentially expressed genes among clusters of CC-stage WT embryos.
Differential gene expression analysis was performed between clusters of CC-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Germ_Layer = germ layer assigned to cluster.

Data S3. Differentially expressed genes among endodermal clusters of LB-stage WT embryos.
Differential gene expression analysis was performed between endodermal clusters of LB-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Cell_Type = cell type assigned to cluster, Transcription_Factor = Y, predicted by Animal Transcription Factor Database to be a transcription factor.

Data S4. Differentially expressed genes among endodermal clusters of CC-stage WT embryos.
Differential gene expression analysis was performed between endodermal clusters of CC-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Cell_Type = cell type assigned to cluster, Transcription_Factor = Y, predicted by Animal Transcription Factor Database to be a transcription factor.

Data S5. Differentially expressed genes among ectodermal clusters of LB-stage WT embryos.
Differential gene expression analysis was performed between ectodermal clusters of LB-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Cell_Type = cell type assigned to cluster, Transcription_Factor = Y, predicted by Animal Transcription Factor Database to be a transcription factor.

Data S6. Differentially expressed genes among ectodermal clusters of CC-stage WT embryos.
Differential gene expression analysis was performed between ectodermal clusters of CC-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Cell_Type = cell type assigned to cluster, Transcription_Factor = Y, predicted by Animal Transcription Factor Database to be a transcription factor.

Data S7. Differentially expressed genes among mesodermal clusters of LB-stage WT embryos.
Differential gene expression analysis was performed between mesodermal clusters of LB-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Cell_Type = cell type assigned to cluster, Transcription_Factor = Y, predicted by Animal Transcription Factor Database to be a transcription factor.

Data S8. Differentially expressed genes among mesodermal clusters of CC-stage WT embryos.
Differential gene expression analysis was performed between mesodermal clusters of CC-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Cell_Type = cell type assigned to cluster, Transcription_Factor = Y, predicted by Animal Transcription Factor Database to be a transcription factor.

Data S9. Numbers of cells per germ layer per LB-stage embryo.
Genotype = genotype of embryo, Embryo = embryo identifier, Germ_Layer = germ layer of embryo, No_Cells = number of cells in germ layer.

Data S10. Numbers of cells per mesodermal cell population per LB-stage embryo.
Genotype = genotype of embryo, Embryo = embryo identifier, Cell_Type = cell type of embryo, No_Cells = number of cells in cell type.

Data S11. Differentially expressed genes among clusters of LB-stage Nipbl+/− embryos.
Differential gene expression analysis was performed between clusters of LB-stage Nipbl+/− embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Germ_Layer = germ layer assigned to cluster.

Data S12. Differentially expressed genes among mesodermal clusters of LB-stage Nipbl+/− embryos.
Differential gene expression analysis was performed between mesodermal clusters of LB-stage Nipbl+/− embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Cell_Type = cell type assigned to cluster.

Data S13. Numbers of cells per germ layer per LB-stage embryo following projection of WT cells onto Nipbl+/− germ layers.
Genotype = genotype of embryo, Embryo = embryo identifier, Germ_Layer = germ layer of embryo, No_Cells = number of cells in germ layer.

Data S14. Numbers of cells per mesodermal cell population per LB-stage embryo following projection of WT cells onto Nipbl+/− mesodermal cell populations.
Genotype = genotype of embryo, Embryo = embryo identifier, Cell_Type = cell type of embryo, No_Cells = number of cells in cell type.

Data S15. Differentially expressed genes among clusters of EB-stage WT embryos.
Differential gene expression analysis was performed between clusters of EB-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Germ_Layer = germ layer assigned to cluster.

Data S16. Differentially expressed genes among clusters of EHF-stage WT embryos.
Differential gene expression analysis was performed between clusters of EHF-stage WT embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Cluster = cluster number, Gene = gene, Pct_Expr_Cluster = percentage of cells in cluster expressing indicated gene, Pct_Expr_Other = percentage of cells in all other clusters expressing indicated gene, Avg_FC = average fold change in expression of indicated gene in cluster versus all other clusters, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value, Germ_Layer = germ layer assigned to cluster.

Data S17. Pseudotime values of cells from EB-, LB-, and CC-stage embryos.
Pseudotime values of cells from EB-, LB-, and CC-stage embryos were calculated using URD. Stage = stage of embryo, Genotype = genotype of embryo, Embryo = identifier of embryo, Cell_Barcode = barcode identifying cell, Cluster = cluster number, Germ_Layer = germ layer assigned to cell, Pseudotime = pseudotime value of cell.

Data S18. RNA velocity vectors of mesoderm cells from LB-stage WT embryos.
scVelo was used to calculate the RNA velocities of mesoderm cells from LB-stage WT embryos. Embryo = embryo identifier, Cell_Barcode = barcode identifying cell, UMAP_1 = first coordinate in UMAP, UMAP_2 = second coordinate in UMAP, Cluster = cluster number, Cell_Type = cell type assigned to cell, Velocity_1 = first coordinate of RNA velocity vector, Velocity_2 = second coordinate of RNA velocity vector.

Data S19. RNA velocity vectors of mesoderm cells from LB-stage Nipbl+/− embryos.
scVelo was used to calculate the RNA velocities of mesoderm cells from LB-stage Nipbl+/− embryos. Embryo = embryo identifier, Cell_Barcode = barcode identifying cell, UMAP_1 = first coordinate in UMAP, UMAP_2 = second coordinate in UMAP, Cluster = cluster number, Cell_Type = cell type assigned to cell, Velocity_1 = first coordinate of RNA velocity vector, Velocity_2 = second coordinate of RNA velocity vector.

Data S20. Fate probabilities of mesoderm cells into first heart field, second heart field, and paraxial mesoderm fates from LB-stage WT embryos.
CellRank was used to calculate the fate probabilities of mesoderm cells into first heart field, second heart field, and paraxial mesoderm fates from LB-stage WT embryos. Embryo = embryo identifier, Cell_Barcode = barcode identifying cell, UMAP_1 = first coordinate in UMAP, UMAP_2 = second coordinate in UMAP, Cluster = cluster number, Cell_Type = cell type assigned to cell, FHF_Prob = first heart field fate probability, SHF_Prob = second heart field fate probability, PM_Prob = paraxial mesoderm fate probability.

Data S21. Fate probabilities of mesoderm cells into first heart field, second heart field, and paraxial mesoderm fates from LB-stage Nipbl+/− embryos.
CellRank was used to calculate the fate probabilities of mesoderm cells into first heart field, second heart field, and paraxial mesoderm fates from LB-stage Nipbl+/− embryos. Embryo = embryo identifier, Cell_Barcode = barcode identifying cell, UMAP_1 = first coordinate in UMAP, UMAP_2 = second coordinate in UMAP, Cluster = cluster number, Cell_Type = cell type assigned to cell, FHF_Prob = first heart field fate probability, SHF_Prob = second heart field fate probability, PM_Prob = paraxial mesoderm fate probability.

Data S22. Differentially expressed genes from Reactome, Hallmark, and Langemeijer Apoptosis Gene Sets between germ layers of LB-stage WT and Nipbl+/− embryos.
Differential gene expression analysis was performed using genes from the Reactome, Hallmark, and Langemeijer Apoptosis Gene Sets between the germ layers of LB-stage WT and Nipbl+/− embryos using the Mann-Whitney U test. P-values were adjusted for multiple hypotheses by Bonferroni Correction. Gene = gene, Avg_Expr_Wildtype = average expression of indicated gene in Wildtype cells, Avg_Expr_Nipbl+/- = average expression of indicated gene in Nipbl+/− cells, SE_Wildtype = standard error of average expression in Wildtype cells, SE_Nipbl+/- = standard error of average expression in Nipbl+/− cells, Avg_FC = average fold change in expression of indicated gene in Nipbl+/− cells versus WT cells, P_Val = p-value from Mann-Whitney U test, Bon_P_Val = Bonferroni corrected p-value.
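As an illustration of how the per-cluster columns defined above relate to the underlying expression values, the following sketch computes Pct_Expr_Cluster, Pct_Expr_Other, and Avg_FC for one gene. The function name is hypothetical, a cell is counted as expressing the gene when its value is above zero, and the fold change is taken as a ratio of mean expression with a pseudocount of 1; all three choices are assumptions, since the exact definitions used to build the data files are not given here.

```python
from statistics import mean

def marker_row(expression, cluster_labels, cluster, gene, pseudocount=1.0):
    """Summarize one gene in one cluster versus all other clusters.

    expression[i] is the gene's expression in cell i and cluster_labels[i]
    is that cell's cluster. Column names follow the data-file legends.
    """
    inside = [v for v, c in zip(expression, cluster_labels) if c == cluster]
    outside = [v for v, c in zip(expression, cluster_labels) if c != cluster]
    return {
        "Cluster": cluster,
        "Gene": gene,
        # Percentage of cells with non-zero expression (assumed definition).
        "Pct_Expr_Cluster": 100 * sum(v > 0 for v in inside) / len(inside),
        "Pct_Expr_Other": 100 * sum(v > 0 for v in outside) / len(outside),
        # Ratio of pseudocount-shifted means (assumed definition of Avg_FC).
        "Avg_FC": (mean(inside) + pseudocount) / (mean(outside) + pseudocount),
    }

# Made-up expression values for six cells in two clusters.
row = marker_row([0, 2, 4, 0, 0, 1], [1, 1, 1, 2, 2, 2], cluster=1, gene="geneA")
```

The P_Val and Bon_P_Val columns would then come from applying a Mann-Whitney U test to the same two groups of cells and multiplying by the number of genes tested.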
Table shows stage of each embryo, genotype of each embryo, number of cells captured per embryo, median number of UMIs captured per cell, and median number of genes detected per cell.

Table S8. Number of cells, transcripts, and genes captured by scRNA-seq of EB- and EHF-stage WT embryos.
scRNA-seq of EB- and EHF-stage WT mouse embryos captured thousands of cells per embryo and thousands of transcripts per cell, and detected thousands of genes per cell. Table shows genotype of each embryo, number of cells captured per embryo, median number of UMIs captured per cell, and median number of genes detected per cell.